Variable selection bias in regression trees with constant fits

Authors

  • Yu-Shan Shih
  • Hsin-Wen Tsai
Abstract

The greedy search approach to variable selection in regression trees with constant fits is considered. At each node, the method usually compares the maximally selected statistic associated with each variable and selects the variable with the largest value to form the split. This method is shown to have selection bias if the predictor variables have different numbers of missing values, and the bias can be corrected by comparing the corresponding P-values instead. Methods related to some change-point problems are used to compute the P-values, and their performances are studied.

Keywords: change-point; maximally selected statistic; missing values; P-values
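As a rough illustration of the two selection rules described in the abstract, the sketch below contrasts the greedy rule (pick the variable with the largest maximally selected statistic) with the P-value-based rule on two predictors, one of which has many missing values. The two-sample t statistic, the permutation approximation of the P-value, and the simulated data are illustrative assumptions only; the paper itself obtains P-values from change-point approximations rather than permutation.

```python
# A minimal sketch, using NumPy only.  The maximally selected statistic here is
# the largest absolute two-sample t statistic over all splits of a predictor,
# computed on its non-missing cases.  The permutation P-value is a stand-in
# (an assumption made for illustration) for the change-point approximations
# used in the paper.
import numpy as np

def max_selected_stat(x, y, min_size=5):
    """Largest |t| over all splits 'x <= c', using only non-missing x."""
    ok = ~np.isnan(x)
    x, y = x[ok], y[ok]
    y = y[np.argsort(x)]
    n = len(y)
    best = 0.0
    for k in range(min_size, n - min_size):
        left, right = y[:k], y[k:]
        s2 = left.var(ddof=1) / k + right.var(ddof=1) / (n - k)
        if s2 > 0:
            best = max(best, abs(left.mean() - right.mean()) / np.sqrt(s2))
    return best

def perm_pvalue(x, y, n_perm=100, seed=None):
    """Permutation P-value of the maximally selected statistic."""
    rng = np.random.default_rng(seed)
    obs = max_selected_stat(x, y)
    perms = [max_selected_stat(x, rng.permutation(y)) for _ in range(n_perm)]
    return (1 + sum(p >= obs for p in perms)) / (n_perm + 1)

rng = np.random.default_rng(0)
n = 200
y = rng.normal(size=n)                     # response independent of both predictors
x1 = rng.normal(size=n)                    # fully observed predictor
x2 = rng.normal(size=n)
x2[rng.random(n) < 0.5] = np.nan           # predictor with many missing values

# Greedy rule: compare the maximally selected statistics directly
# (biased when the predictors have different numbers of missing values).
print("max stats:", max_selected_stat(x1, y), max_selected_stat(x2, y))
# Corrected rule: compare P-values and pick the smaller one.
print("P-values :", perm_pvalue(x1, y, seed=1), perm_pvalue(x2, y, seed=2))
```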

Similar articles

Visualizable and Interpretable Regression Models With Good Prediction Power

Many methods can fit models with higher prediction accuracy, on average, than least squares linear regression. But the models, including linear regression, are typically impossible to interpret or visualize. We describe a tree-structured method that fits a simple but non-trivial model to each partition of the variable space. This ensures that each piece of the fitted regression function can be ...

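A minimal sketch of the idea described in this entry, under simplifying assumptions: the predictor space is split once and a simple one-variable least-squares line is fitted in each piece, so that every piece of the fitted function remains easy to visualize. The single split on x1 and the per-piece model in x2 are illustrative choices, not the method proposed in the cited paper.

```python
# Piecewise simple models: one illustrative partition on x1, with a
# one-variable least-squares line in x2 fitted inside each piece.
import numpy as np

rng = np.random.default_rng(0)
x1 = rng.uniform(-2, 2, size=300)
x2 = rng.uniform(-2, 2, size=300)
y = np.where(x1 < 0, 1 + 2 * x2, -1 - x2) + rng.normal(scale=0.3, size=300)

def fit_line(u, v):
    """Least-squares slope and intercept of v on u."""
    A = np.column_stack([u, np.ones_like(u)])
    coef, *_ = np.linalg.lstsq(A, v, rcond=None)
    return coef  # (slope, intercept)

left = x1 < 0
for name, mask in [("x1 < 0", left), ("x1 >= 0", ~left)]:
    slope, intercept = fit_line(x2[mask], y[mask])
    print(f"{name}: y = {intercept:.2f} + {slope:.2f} * x2")
```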


Regression Trees With Unbiased Variable Selection and Interaction Detection

We propose an algorithm for regression tree construction called GUIDE. It is specifically designed to eliminate variable selection bias, a problem that can undermine the reliability of inferences from a tree structure. GUIDE controls bias by employing chi-square analysis of residuals and bootstrap calibration of significance probabilities. This approach allows fast computation speed, natural ex...

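A minimal sketch of the residual-based selection step mentioned in this entry, under simplifying assumptions: residual signs from the node mean are cross-tabulated against quartile groups of each predictor, and predictors are ranked by the chi-square P-value. The quartile grouping and the toy data are illustrative; GUIDE itself adds further refinements such as bootstrap calibration of the significance probabilities.

```python
# Residual-sign chi-square screening of predictors at a node with a constant fit.
import numpy as np
from scipy.stats import chi2_contingency

def chi2_pvalue(x, resid_sign):
    """P-value of a chi-square test of residual sign vs. quartile group of x."""
    groups = np.digitize(x, np.quantile(x, [0.25, 0.5, 0.75]))  # 4 groups
    table = np.zeros((2, 4))
    for g, s in zip(groups, resid_sign):
        table[s, g] += 1
    return chi2_contingency(table)[1]

rng = np.random.default_rng(0)
n = 300
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = np.sin(2 * x1) + rng.normal(scale=0.5, size=n)   # y depends on x1 only

resid_sign = ((y - y.mean()) > 0).astype(int)        # sign of residual from node mean
for name, x in [("x1", x1), ("x2", x2)]:
    print(name, "P-value:", chi2_pvalue(x, resid_sign))
```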

Variable Selection Bias in Classification Trees Based on Imprecise Probabilities

Classification trees are a popular statistical tool with multiple applications. Recent advancements of traditional classification trees, such as the approach of classification trees based on imprecise probabilities by Abellán and Moral (2005), effectively address their tendency to overfitting. However, another flaw inherent in traditional classification trees is not eliminated by the imprecise ...


Variable Selection in Classification Trees Based on Imprecise Probabilities

Classification trees are a popular statistical tool with multiple applications. Recent advancements of traditional classification trees, such as the approach of classification trees based on imprecise probabilities by Abellán and Moral (2004), effectively address their tendency to overfitting. However, another flaw inherent in traditional classification trees is not eliminated by the imprecise ...



Journal title:
  • Computational Statistics & Data Analysis

Volume 45, Issue -

Pages -

Publication date: 2004